5 research outputs found

    Evaluating the impact of voice activity detection on speech emotion recognition for autistic children

    Individuals with autism are known to face challenges with emotion regulation and express their affective states in a variety of ways. With this in mind, an increasing amount of research on automatic affect recognition from speech and other modalities has recently been presented to assist and support autistic individuals, as well as to improve understanding of their behaviours. Beyond the emotion expressed in the voice, the dynamics of autistic children's verbal speech can be inconsistent and vary greatly amongst individuals. The current contribution outlines a voice activity detection (VAD) system specifically adapted to autistic children's vocalisations. The presented VAD system is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. It is trained on 130 acoustic Low-Level Descriptors (LLDs) extracted from more than 17 hours of audio recordings, which were richly annotated by experts in terms of perceived emotion as well as occurrence and type of vocalisations. The data comprise recordings of 25 English-speaking autistic children undertaking a structured, partly robot-assisted emotion-training activity, collected as part of the DE-ENIGMA project. The VAD system is further utilised as a preprocessing step for a continuous speech emotion recognition (SER) task, aiming to minimise the effects of potentially confounding information such as noise, silence, or non-child vocalisation. Its impact on SER performance is compared to that of other VAD systems, including a general VAD system trained on the same data set, an out-of-the-box Web Real-Time Communication (WebRTC) VAD system, and the expert annotations. Our experiments show that the child VAD system achieves a lower performance than our general VAD system trained under identical conditions, with receiver operating characteristic area under the curve (ROC-AUC) values of 0.662 and 0.850, respectively. The SER results show varying performance across valence and arousal depending on the utilised VAD system, with a maximum concordance correlation coefficient (CCC) of 0.263 and a minimum root mean square error (RMSE) of 0.107. Although the performance of the SER models is generally low, the child VAD system can lead to slightly improved results compared to other VAD systems, and in particular the VAD-less baseline, supporting the hypothesised importance of child VAD systems in the discussed context.
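    The continuous SER evaluation above is reported in terms of CCC and RMSE. As an illustration only, here is a minimal NumPy sketch of both metrics (the function names are ours, not from the paper, and this is the standard textbook definition of CCC, not the authors' code):

    ```python
    import numpy as np

    def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Concordance correlation coefficient: agreement between predicted
        and gold continuous labels (e.g. per-frame valence or arousal)."""
        mx, my = y_true.mean(), y_pred.mean()
        vx, vy = y_true.var(), y_pred.var()
        cov = np.mean((y_true - mx) * (y_pred - my))
        return float(2 * cov / (vx + vy + (mx - my) ** 2))

    def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Root mean square error between predictions and gold labels."""
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    # Perfect agreement yields CCC = 1 and RMSE = 0; a constant offset
    # leaves Pearson correlation untouched but penalises CCC.
    gold = np.array([0.1, 0.2, 0.3, 0.4])
    print(ccc(gold, gold))   # 1.0
    print(rmse(gold, gold))  # 0.0
    ```

    Unlike Pearson correlation, CCC penalises both scale and location shifts between predictions and labels, which is why it is the common choice for continuous valence/arousal tasks.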

    Non-participatory user-centered design of accessible teacher-teleoperated robot and tablets for minimally verbal autistic children

    Autistic children with limited language ability are an important but overlooked community. We develop a teacher-teleoperated robot and tablet system, along with learning activities, to help teach facial emotions to minimally verbal autistic children. We then conduct user studies with 31 minimally verbal autistic children in the UK and Serbia to evaluate the system's accessibility. Results show that minimally verbal autistic children can use the tablet interface to control or respond to a humanoid robot and can understand the face-learning activities. We find that a flexible and powerful Wizard-of-Oz tablet interface respects the needs of the children and their teachers. Our work suggests that a non-participatory, user-centered design process can create a robot and tablet system that is accessible to many autistic children.

    Dialogue Design for a Robot-Based Face-Mirroring Game to Engage Autistic Children with Emotional Expressions

    We present design strategies for human-robot interaction with school-aged autistic children who have limited receptive language. Applying these strategies in the DE-ENIGMA project (a large EU project addressing emotion recognition in autistic children) supported the development of a new facial expression imitation activity, in which the robot imitates the child's face to encourage the child to notice facial expressions in a play-based game. A usability case study with 15 typically developing children aged 4–6 at an English-language school in the Netherlands was performed to assess the feasibility of the setup and to make design revisions before introducing the robot to autistic children.